Overview

Dataset statistics

Number of variables: 10
Number of observations: 20640
Missing cells: 207
Missing cells (%): 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.6 MiB
Average record size in memory: 80.0 B

Variable types

Numeric: 8
Categorical: 2

Alerts

total_rooms has a high cardinality: 5832 distinct values [High cardinality]
longitude is highly correlated with median_house_value and 1 other field [High correlation]
latitude is highly correlated with longitude [High correlation]
total_bedrooms is highly correlated with population and 1 other field [High correlation]
population is highly correlated with total_bedrooms and 1 other field [High correlation]
households is highly correlated with total_bedrooms and 1 other field [High correlation]
median_income is highly correlated with median_house_value [High correlation]
median_house_value is highly correlated with longitude and 2 other fields [High correlation]
ocean_proximity is highly correlated with longitude and 1 other field [High correlation]
total_bedrooms has 207 (1.0%) missing values [Missing]
latitude is highly skewed (γ1 = 59.45812641) [Skewed]

Reproduction

Analysis started: 2022-10-20 16:40:41.388164
Analysis finished: 2022-10-20 16:40:58.528578
Duration: 17.14 seconds
Software version: pandas-profiling v3.3.0
Download configuration: config.json

Variables

longitude
Real number (ℝ)

HIGH CORRELATION

Distinct: 844
Distinct (%): 4.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -119.5697045
Minimum: -124.35
Maximum: -114.31
Zeros: 0
Zeros (%): 0.0%
Negative: 20640
Negative (%): 100.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: -124.35
5th percentile: -122.47
Q1: -121.8
median: -118.49
Q3: -118.01
95th percentile: -117.08
Maximum: -114.31
Range: 10.04
Interquartile range (IQR): 3.79
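Range and IQR come straight from the quantiles. Range = Maximum - Minimum = -114.31 - (-124.35) = 10.04, and IQR = Q3 - Q1 = -118.01 - (-121.8) = 3.79.

The standard-library sketch below computes the same summary. It assumes linear interpolation between order statistics, which is the default of pandas' `Series.quantile`. The standard library implements the same rule in `statistics.quantiles(..., method="inclusive")`. `quantile_summary` is a hypothetical helper, not part of pandas-profiling.

```python
import statistics


def quantile_summary(values):
    """Quantile statistics for a list of at least two numbers.

    Cut points use linear interpolation between order statistics
    (method="inclusive"), the same rule as pandas' default Series.quantile.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    twentieths = statistics.quantiles(values, n=20, method="inclusive")
    lo, hi = min(values), max(values)
    return {
        "min": lo,
        "p5": twentieths[0],    # 5th percentile
        "q1": q1,
        "median": median,
        "q3": q3,
        "p95": twentieths[-1],  # 95th percentile
        "max": hi,
        "range": hi - lo,
        "iqr": q3 - q1,
    }


summary = quantile_summary([1, 2, 3, 4, 5])
print(summary["iqr"], summary["range"])  # 2.0 4
```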

Descriptive statistics

Standard deviation: 2.003531724
Coefficient of variation (CV): -0.01675618195
Kurtosis: -1.330152366
Mean: -119.5697045
Median Absolute Deviation (MAD): 1.28
Skewness: -0.297801208
Sum: -2467918.7
Variance: 4.014139367
Monotonicity: Not monotonic
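Two of these statistics are easy to check by hand.

- The coefficient of variation is the standard deviation divided by the mean: 2.003531724 / -119.5697045 ≈ -0.01676. It is negative only because every longitude is negative.
- The median absolute deviation is the median of |x - median(x)|.

A minimal sketch follows. It assumes the sample standard deviation (n - 1 denominator, pandas' default) and an unscaled MAD with no consistency factor. The function names are illustrative.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation (n - 1 denominator) divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def median_absolute_deviation(values):
    """Median of the absolute deviations from the median (no scaling factor)."""
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)


print(median_absolute_deviation([1, 2, 3, 4, 5]))  # 1
```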
[Figure: Histogram with fixed size bins (bins=50)]
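Each of the 50 bins spans (Maximum - Minimum) / 50. For longitude that is 10.04 / 50 = 0.2008 degrees.

A minimal sketch of fixed-width binning follows. It assumes half-open bins, with the last bin closed on the right so that the maximum is counted; this is the convention `numpy.histogram` uses. `fixed_width_histogram` is a hypothetical name.

```python
def fixed_width_histogram(values, bins=50):
    """Count values in `bins` equal-width bins spanning [min, max].

    Bins are half-open [left, right) except the last, which also
    includes the maximum value.
    """
    lo, hi = min(values), max(values)
    # If all values are equal, fall back to unit-width bins so everything
    # lands in the first bin instead of dividing by zero.
    width = (hi - lo) / bins or 1.0
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        idx = int((v - lo) / width)
        counts[min(idx, bins - 1)] += 1  # clamp the maximum into the last bin
    return counts, edges
```

For 20640 longitudes this returns 50 counts that sum to 20640, plus the 51 bin edges.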
Common values
Value                 Count   Frequency (%)
-118.31               162     0.8%
-118.3                160     0.8%
-118.29               148     0.7%
-118.27               144     0.7%
-118.32               142     0.7%
-118.28               141     0.7%
-118.35               140     0.7%
-118.36               138     0.7%
-118.19               135     0.7%
-118.37               128     0.6%
Other values (834)    19202   93.0%

Minimum 10 values
Value                 Count   Frequency (%)
-124.35               1       < 0.1%
-124.3                2       < 0.1%
-124.27               1       < 0.1%
-124.26               1       < 0.1%
-124.25               1       < 0.1%
-124.23               3       < 0.1%
-124.22               1       < 0.1%
-124.21               3       < 0.1%
-124.19               4       < 0.1%
-124.18               6       < 0.1%

Maximum 10 values
Value                 Count   Frequency (%)
-114.31               1       < 0.1%
-114.47               1       < 0.1%
-114.49               1       < 0.1%
-114.55               1       < 0.1%
-114.56               1       < 0.1%
-114.57               3       < 0.1%
-114.58               2       < 0.1%
-114.59               2       < 0.1%
-114.6                3       < 0.1%
-114.61               3       < 0.1%
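The three tables above can all be derived from a single frequency count:

- The common values are the ten most frequent values plus an "Other values" row.
- The extreme values are the ten smallest and ten largest distinct values, with their counts.

The sketch below uses `collections.Counter`. The rule that a non-zero share which rounds to 0.0% is shown as "< 0.1%" is inferred from this report, not taken from pandas-profiling's source. In the latitude tables, a count of 10 out of 20640 rows appears as "< 0.1%" and a count of 11 appears as "0.1%".

```python
from collections import Counter


def format_share(count, total):
    """Format count/total as a percentage with one decimal place.

    Non-zero shares that round to 0.0% are shown as "< 0.1%".
    """
    text = f"{100 * count / total:.1f}%"
    return "< 0.1%" if text == "0.0%" and count > 0 else text


def frequency_tables(values, top=10):
    """Return (common, smallest, largest) rows of (value, count, share)."""
    counts = Counter(values)
    total = len(values)
    common = [(v, c, format_share(c, total)) for v, c in counts.most_common(top)]
    other_distinct = len(counts) - len(common)
    if other_distinct > 0:
        other_n = total - sum(c for _, c, _ in common)
        common.append((f"Other values ({other_distinct})", other_n,
                       format_share(other_n, total)))
    smallest = [(v, counts[v], format_share(counts[v], total))
                for v in sorted(counts)[:top]]
    largest = [(v, counts[v], format_share(counts[v], total))
               for v in sorted(counts, reverse=True)[:top]]
    return common, smallest, largest


print(format_share(10, 20640), format_share(162, 20640))  # < 0.1% 0.8%
```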

latitude
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 863
Distinct (%): 4.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.64834399
Minimum: 32.54
Maximum: 378
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 32.54
5th percentile: 32.82
Q1: 33.93
median: 34.26
Q3: 37.71
95th percentile: 38.96
Maximum: 378
Range: 345.46
Interquartile range (IQR): 3.78

Descriptive statistics

Standard deviation: 3.20017739
Coefficient of variation (CV): 0.08977071671
Kurtosis: 6345.285734
Mean: 35.64834399
Median Absolute Deviation (MAD): 1.23
Skewness: 59.45812641
Sum: 735781.82
Variance: 10.24113533
Monotonicity: Not monotonic
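The skewness of 59.46 and kurtosis of 6345 are dominated by a single value. The maximum latitude is 378, which occurs once, while the next largest value is 41.95.

The sketch below computes the bias-corrected sample skewness (G1) and excess kurtosis (G2). It assumes the report uses the same adjusted Fisher-Pearson estimators as pandas' `Series.skew()` and `Series.kurt()`. It then shows how one outlier inflates skewness. The latitudes in the usage lines are hypothetical.

```python
def _moments(values):
    """Return n and the 2nd, 3rd and 4th central moments (divided by n)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return n, m2, m3, m4


def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness G1 (needs n >= 3)."""
    n, m2, m3, _ = _moments(values)
    g1 = m3 / m2 ** 1.5
    return (n * (n - 1)) ** 0.5 / (n - 2) * g1


def sample_excess_kurtosis(values):
    """Bias-corrected excess kurtosis G2 (needs n >= 4); about 0 for normal data."""
    n, m2, _, m4 = _moments(values)
    g2 = m4 / m2 ** 2 - 3
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)


# Hypothetical latitudes: a tight cluster, then the same cluster plus one
# mistyped value of the kind seen in this column.
clustered = [33.9, 34.0, 34.1, 34.2, 37.7, 37.8, 38.0]
with_outlier = clustered + [378.0]
print(sample_skewness(clustered), sample_skewness(with_outlier))
```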
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value                 Count   Frequency (%)
34.06                 244     1.2%
34.05                 236     1.1%
34.08                 234     1.1%
34.07                 231     1.1%
34.04                 221     1.1%
34.09                 212     1.0%
34.02                 208     1.0%
34.1                  203     1.0%
34.03                 193     0.9%
33.93                 181     0.9%
Other values (853)    18477   89.5%

Minimum 10 values
Value                 Count   Frequency (%)
32.54                 1       < 0.1%
32.55                 3       < 0.1%
32.56                 10      < 0.1%
32.57                 18      0.1%
32.58                 26      0.1%
32.59                 11      0.1%
32.6                  9       < 0.1%
32.61                 14      0.1%
32.62                 13      0.1%
32.63                 18      0.1%

Maximum 10 values
Value                 Count   Frequency (%)
378                   1       < 0.1%
41.95                 2       < 0.1%
41.92                 1       < 0.1%
41.88                 1       < 0.1%
41.86                 3       < 0.1%
41.84                 1       < 0.1%
41.82                 1       < 0.1%
41.81                 2       < 0.1%
41.8                  3       < 0.1%
41.79                 1       < 0.1%

housing_median_age
Real number (ℝ≥0)

Distinct: 53
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 28.65692829
Minimum: 1
Maximum: 400
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 1
5th percentile: 8
Q1: 18
median: 29
Q3: 37
95th percentile: 52
Maximum: 400
Range: 399
Interquartile range (IQR): 19

Descriptive statistics

Standard deviation: 12.84802111
Coefficient of variation (CV): 0.4483390884
Kurtosis: 32.84630803
Mean: 28.65692829
Median Absolute Deviation (MAD): 10
Skewness: 1.222725723
Sum: 591479
Variance: 165.0716464
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value                 Count   Frequency (%)
52                    1273    6.2%
36                    862     4.2%
35                    824     4.0%
16                    771     3.7%
17                    698     3.4%
34                    689     3.3%
26                    619     3.0%
33                    615     3.0%
18                    570     2.8%
25                    566     2.7%
Other values (43)     13153   63.7%

Minimum 10 values
Value                 Count   Frequency (%)
1                     4       < 0.1%
2                     58      0.3%
3                     62      0.3%
4                     191     0.9%
5                     244     1.2%
6                     160     0.8%
7                     175     0.8%
8                     206     1.0%
9                     205     1.0%
10                    264     1.3%

Maximum 10 values
Value                 Count   Frequency (%)
400                   1       < 0.1%
52                    1273    6.2%
51                    48      0.2%
50                    136     0.7%
49                    134     0.6%
48                    177     0.9%
47                    198     1.0%
46                    245     1.2%
45                    294     1.4%
44                    356     1.7%

total_rooms
Categorical

HIGH CARDINALITY

Distinct: 5832
Distinct (%): 28.3%
Missing: 0
Missing (%): 0.0%
Memory size: 161.4 KiB
??                    1018
1527                  18
1613                  17
1471                  15
2127                  15
Other values (5827)   19557

Length

Max length: 5
Median length: 4
Mean length: 3.802713178
Min length: 1

Characters and Unicode

Total characters: 78488
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
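Python's standard `unicodedata` module exposes one of these properties, the general category. The sketch below tallies categories across a few `total_rooms` strings.

The `CATEGORY_NAMES` table maps two-letter codes (`Nd`, `Po`, ...) to the long names used in this report. It is written out by hand for this sketch and covers only the categories that appear here. Scripts and blocks are not available from `unicodedata`, so the sketch does not cover them.

```python
import unicodedata
from collections import Counter

# Long names of the Unicode general categories that occur in this report.
CATEGORY_NAMES = {
    "Nd": "Decimal Number",
    "Po": "Other Punctuation",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Sm": "Math Symbol",
}


def category_counts(strings):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for s in strings:
        for ch in s:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


print(category_counts(["880", "7099", "??"]))
```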

Unique

Unique: 2018
Unique (%): 9.8%

Sample

1st row: 880
2nd row: 7099
3rd row: 1467
4th row: 1274
5th row: 1627

Common Values

Value                 Count   Frequency (%)
??                    1018    4.9%
1527                  18      0.1%
1613                  17      0.1%
1471                  15      0.1%
2127                  15      0.1%
2053                  15      0.1%
1717                  15      0.1%
1705                  14      0.1%
1787                  14      0.1%
1607                  14      0.1%
Other values (5822)   19485   94.4%

Length

[Figure: Histogram of lengths of the category]

Words
Value                 Count   Frequency (%)
(blank)               1018    4.9%
1527                  18      0.1%
1613                  17      0.1%
1471                  15      0.1%
2127                  15      0.1%
2053                  15      0.1%
1717                  15      0.1%
1650                  14      0.1%
1731                  14      0.1%
1703                  14      0.1%
Other values (5822)   19485   94.4%

Most occurring characters

Value   Count   Frequency (%)
1       13023   16.6%
2       11259   14.3%
3       8334    10.6%
4       7184    9.2%
5       6548    8.3%
7       6215    7.9%
6       6129    7.8%
0       5946    7.6%
9       5908    7.5%
8       5906    7.5%

Most occurring categories

Value               Count   Frequency (%)
Decimal Number      76452   97.4%
Other Punctuation   2036    2.6%

Most frequent character per category

Decimal Number
Value   Count   Frequency (%)
1       13023   17.0%
2       11259   14.7%
3       8334    10.9%
4       7184    9.4%
5       6548    8.6%
7       6215    8.1%
6       6129    8.0%
0       5946    7.8%
9       5908    7.7%
8       5906    7.7%

Other Punctuation
Value   Count   Frequency (%)
?       2036    100.0%

Most occurring scripts

Value    Count   Frequency (%)
Common   78488   100.0%

Most frequent character per script

Common
Value   Count   Frequency (%)
1       13023   16.6%
2       11259   14.3%
3       8334    10.6%
4       7184    9.2%
5       6548    8.3%
7       6215    7.9%
6       6129    7.8%
0       5946    7.6%
9       5908    7.5%
8       5906    7.5%

Most occurring blocks

Value   Count   Frequency (%)
ASCII   78488   100.0%

Most frequent character per block

ASCII
Value   Count   Frequency (%)
1       13023   16.6%
2       11259   14.3%
3       8334    10.6%
4       7184    9.2%
5       6548    8.3%
7       6215    7.9%
6       6129    7.8%
0       5946    7.6%
9       5908    7.5%
8       5906    7.5%
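total_rooms is profiled as Categorical, and its 5832 distinct values trigger the high-cardinality alert. The cause is that 1018 rows hold the placeholder "??" (2036 "?" characters in total) instead of a number, while every other character in the column is a decimal digit.

A minimal sketch of converting the column back to integers follows. It assumes that "??", or any other non-numeric string, should be treated as missing. `parse_total_rooms` is a hypothetical helper.

```python
def parse_total_rooms(raw):
    """Convert a total_rooms string to an int, or None if it is not numeric.

    Assumption: placeholders such as "??" mean "value unknown".
    """
    text = raw.strip()
    return int(text) if text.isdecimal() else None


raw_values = ["880", "7099", "??", "1467"]
parsed = [parse_total_rooms(v) for v in raw_values]
missing = sum(v is None for v in parsed)
print(parsed, missing)  # [880, 7099, None, 1467] 1
```

In pandas, `pd.to_numeric(df["total_rooms"], errors="coerce")` does the same in one call. It returns floats, with NaN in place of the placeholders.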

total_bedrooms
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 1923
Distinct (%): 9.4%
Missing: 207
Missing (%): 1.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 537.8705525
Minimum: 1
Maximum: 6445
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 1
5th percentile: 137
Q1: 296
median: 435
Q3: 647
95th percentile: 1275.4
Maximum: 6445
Range: 6444
Interquartile range (IQR): 351

Descriptive statistics

Standard deviation: 421.3850701
Coefficient of variation (CV): 0.7834321252
Kurtosis: 21.98557506
Mean: 537.8705525
Median Absolute Deviation (MAD): 162
Skewness: 3.459546332
Sum: 10990309
Variance: 177565.3773
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value                 Count   Frequency (%)
280                   55      0.3%
331                   51      0.2%
345                   50      0.2%
343                   49      0.2%
393                   49      0.2%
348                   48      0.2%
394                   48      0.2%
328                   48      0.2%
309                   47      0.2%
272                   47      0.2%
Other values (1913)   19941   96.6%
(Missing)             207     1.0%

Minimum 10 values
Value                 Count   Frequency (%)
1                     1       < 0.1%
2                     2       < 0.1%
3                     5       < 0.1%
4                     7       < 0.1%
5                     6       < 0.1%
6                     5       < 0.1%
7                     6       < 0.1%
8                     8       < 0.1%
9                     7       < 0.1%
10                    8       < 0.1%

Maximum 10 values
Value                 Count   Frequency (%)
6445                  1       < 0.1%
6210                  1       < 0.1%
5471                  1       < 0.1%
5419                  1       < 0.1%
5290                  1       < 0.1%
5033                  1       < 0.1%
5027                  1       < 0.1%
4957                  1       < 0.1%
4952                  1       < 0.1%
4819                  1       < 0.1%

population
Real number (ℝ)

HIGH CORRELATION

Distinct: 3889
Distinct (%): 18.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1425.379942
Minimum: -999
Maximum: 35682
Zeros: 0
Zeros (%): 0.0%
Negative: 1
Negative (%): < 0.1%
Memory size: 161.4 KiB

Quantile statistics

Minimum: -999
5th percentile: 348
Q1: 787
median: 1166
Q3: 1725
95th percentile: 3288
Maximum: 35682
Range: 36681
Interquartile range (IQR): 938

Descriptive statistics

Standard deviation: 1132.583966
Coefficient of variation (CV): 0.794583909
Kurtosis: 73.52288283
Mean: 1425.379942
Median Absolute Deviation (MAD): 440
Skewness: 4.934049079
Sum: 29419842
Variance: 1282746.44
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value                 Count   Frequency (%)
891                   25      0.1%
1227                  24      0.1%
1052                  24      0.1%
850                   24      0.1%
761                   24      0.1%
825                   23      0.1%
782                   22      0.1%
1005                  22      0.1%
872                   21      0.1%
999                   21      0.1%
Other values (3879)   20410   98.9%

Minimum 10 values
Value                 Count   Frequency (%)
-999                  1       < 0.1%
3                     1       < 0.1%
5                     1       < 0.1%
6                     1       < 0.1%
8                     4       < 0.1%
9                     2       < 0.1%
11                    1       < 0.1%
13                    4       < 0.1%
14                    3       < 0.1%
15                    2       < 0.1%

Maximum 10 values
Value                 Count   Frequency (%)
35682                 1       < 0.1%
28566                 1       < 0.1%
16305                 1       < 0.1%
16122                 1       < 0.1%
15507                 1       < 0.1%
15037                 1       < 0.1%
13251                 1       < 0.1%
12873                 1       < 0.1%
12427                 1       < 0.1%
12203                 1       < 0.1%

households
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1815
Distinct (%): 8.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 499.5396802
Minimum: 1
Maximum: 6082
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 1
5th percentile: 125
Q1: 280
median: 409
Q3: 605
95th percentile: 1162
Maximum: 6082
Range: 6081
Interquartile range (IQR): 325

Descriptive statistics

Standard deviation: 382.3297528
Coefficient of variation (CV): 0.7653641301
Kurtosis: 22.05798806
Mean: 499.5396802
Median Absolute Deviation (MAD): 151
Skewness: 3.410437712
Sum: 10310499
Variance: 146176.0399
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value                 Count   Frequency (%)
306                   57      0.3%
386                   56      0.3%
335                   56      0.3%
282                   55      0.3%
429                   54      0.3%
375                   53      0.3%
284                   51      0.2%
297                   51      0.2%
278                   50      0.2%
340                   50      0.2%
Other values (1805)   20107   97.4%

Minimum 10 values
Value                 Count   Frequency (%)
1                     1       < 0.1%
2                     3       < 0.1%
3                     4       < 0.1%
4                     4       < 0.1%
5                     7       < 0.1%
6                     5       < 0.1%
7                     10      < 0.1%
8                     8       < 0.1%
9                     9       < 0.1%
10                    7       < 0.1%

Maximum 10 values
Value                 Count   Frequency (%)
6082                  1       < 0.1%
5358                  1       < 0.1%
5189                  1       < 0.1%
5050                  1       < 0.1%
4930                  1       < 0.1%
4855                  1       < 0.1%
4769                  1       < 0.1%
4616                  1       < 0.1%
4490                  1       < 0.1%
4372                  1       < 0.1%

median_income
Real number (ℝ)

HIGH CORRELATION

Distinct: 12929
Distinct (%): 62.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.870622563
Minimum: -0.4999
Maximum: 15.0001
Zeros: 0
Zeros (%): 0.0%
Negative: 1
Negative (%): < 0.1%
Memory size: 161.4 KiB

Quantile statistics

Minimum: -0.4999
5th percentile: 1.60057
Q1: 2.5634
median: 3.5348
Q3: 4.74325
95th percentile: 7.300305
Maximum: 15.0001
Range: 15.5
Interquartile range (IQR): 2.17985

Descriptive statistics

Standard deviation: 1.89992041
Coefficient of variation (CV): 0.490856543
Kurtosis: 4.951916681
Mean: 3.870622563
Median Absolute Deviation (MAD): 1.0642
Skewness: 1.646157328
Sum: 79889.6497
Variance: 3.609697566
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value                 Count   Frequency (%)
3.125                 49      0.2%
15.0001               49      0.2%
2.875                 46      0.2%
2.625                 44      0.2%
4.125                 44      0.2%
3.875                 41      0.2%
3                     38      0.2%
3.375                 38      0.2%
4                     37      0.2%
3.625                 37      0.2%
Other values (12919)  20217   98.0%

Minimum 10 values
Value                 Count   Frequency (%)
-0.4999               1       < 0.1%
0.4999                11      0.1%
0.536                 10      < 0.1%
0.5495                1       < 0.1%
0.6433                1       < 0.1%
0.6775                1       < 0.1%
0.6825                1       < 0.1%
0.6831                1       < 0.1%
0.696                 1       < 0.1%
0.6991                1       < 0.1%

Maximum 10 values
Value                 Count   Frequency (%)
15.0001               49      0.2%
15                    2       < 0.1%
14.9009               1       < 0.1%
14.5833               1       < 0.1%
14.4219               1       < 0.1%
14.4113               1       < 0.1%
14.2959               1       < 0.1%
14.2867               1       < 0.1%
13.947                1       < 0.1%
13.8556               1       < 0.1%

median_house_value
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 3842
Distinct (%): 18.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 206855.8169
Minimum: 14999
Maximum: 500001
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 14999
5th percentile: 66200
Q1: 119600
median: 179700
Q3: 264725
95th percentile: 489810
Maximum: 500001
Range: 485002
Interquartile range (IQR): 145125

Descriptive statistics

Standard deviation: 115395.6159
Coefficient of variation (CV): 0.55785531
Kurtosis: 0.3278702429
Mean: 206855.8169
Median Absolute Deviation (MAD): 68400
Skewness: 0.9777632739
Sum: 4269504061
Variance: 1.331614816 × 10^10
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value                 Count   Frequency (%)
500001                965     4.7%
137500                122     0.6%
162500                117     0.6%
112500                103     0.5%
187500                93      0.5%
225000                92      0.4%
350000                79      0.4%
87500                 78      0.4%
275000                65      0.3%
150000                64      0.3%
Other values (3832)   18862   91.4%

Minimum 10 values
Value                 Count   Frequency (%)
14999                 4       < 0.1%
17500                 1       < 0.1%
22500                 4       < 0.1%
25000                 1       < 0.1%
26600                 1       < 0.1%
26900                 1       < 0.1%
27500                 1       < 0.1%
28300                 1       < 0.1%
30000                 2       < 0.1%
32500                 4       < 0.1%

Maximum 10 values
Value                 Count   Frequency (%)
500001                965     4.7%
500000                27      0.1%
499100                1       < 0.1%
499000                1       < 0.1%
498800                1       < 0.1%
498700                1       < 0.1%
498600                1       < 0.1%
498400                1       < 0.1%
497600                1       < 0.1%
497400                1       < 0.1%

ocean_proximity
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 161.4 KiB
<1H OCEAN             9136
INLAND                6551
NEAR OCEAN            2658
NEAR BAY              2290
ISLAND                5

Length

Max length: 10
Median length: 9
Mean length: 8.064922481
Min length: 6

Characters and Unicode

Total characters: 166460
Distinct characters: 16
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: NEAR BAY
2nd row: NEAR BAY
3rd row: NEAR BAY
4th row: NEAR BAY
5th row: NEAR BAY

Common Values

Value                 Count   Frequency (%)
<1H OCEAN             9136    44.3%
INLAND                6551    31.7%
NEAR OCEAN            2658    12.9%
NEAR BAY              2290    11.1%
ISLAND                5       < 0.1%

Length

[Figure: Histogram of lengths of the category]

[Figure: Category Frequency Plot]

Words
Value                 Count   Frequency (%)
ocean                 11794   34.0%
1h                    9136    26.3%
inland                6551    18.9%
near                  4948    14.2%
bay                   2290    6.6%
island                5       < 0.1%
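This table counts words rather than whole categories. The values appear to be lowercased, split on whitespace and stripped of punctuation:

- "<1H OCEAN" contributes both "1h" and "ocean".
- "ocean" (11794 = 9136 + 2658) combines two categories.
- "near" (4948 = 2658 + 2290) also combines two categories.

The sketch below reproduces that tokenization. The rule is inferred from the table, not taken from pandas-profiling's source. Punctuation-only tokens such as "??" become empty strings and are still counted.

```python
import string
from collections import Counter


def word_counts(values):
    """Lowercase, split on whitespace, strip surrounding punctuation, count."""
    counts = Counter()
    for value in values:
        for token in value.lower().split():
            # "<1h" -> "1h"; a punctuation-only token such as "??" -> ""
            counts[token.strip(string.punctuation)] += 1
    return counts


print(word_counts(["<1H OCEAN", "NEAR OCEAN", "NEAR BAY", "INLAND"]))
```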

Most occurring characters

Value                 Count   Frequency (%)
N                     29849   17.9%
A                     25588   15.4%
E                     16742   10.1%
(space)               14084   8.5%
O                     11794   7.1%
C                     11794   7.1%
<                     9136    5.5%
1                     9136    5.5%
H                     9136    5.5%
I                     6556    3.9%
Other values (6)      22645   13.6%

Most occurring categories

Value                 Count    Frequency (%)
Uppercase Letter      134104   80.6%
Space Separator       14084    8.5%
Math Symbol           9136     5.5%
Decimal Number        9136     5.5%

Most frequent character per category

Uppercase Letter
Value                 Count   Frequency (%)
N                     29849   22.3%
A                     25588   19.1%
E                     16742   12.5%
O                     11794   8.8%
C                     11794   8.8%
H                     9136    6.8%
I                     6556    4.9%
L                     6556    4.9%
D                     6556    4.9%
R                     4948    3.7%
Other values (3)      4585    3.4%

Space Separator
Value                 Count   Frequency (%)
(space)               14084   100.0%

Math Symbol
Value                 Count   Frequency (%)
<                     9136    100.0%

Decimal Number
Value                 Count   Frequency (%)
1                     9136    100.0%

Most occurring scripts

Value                 Count    Frequency (%)
Latin                 134104   80.6%
Common                32356    19.4%

Most frequent character per script

Latin
Value                 Count   Frequency (%)
N                     29849   22.3%
A                     25588   19.1%
E                     16742   12.5%
O                     11794   8.8%
C                     11794   8.8%
H                     9136    6.8%
I                     6556    4.9%
L                     6556    4.9%
D                     6556    4.9%
R                     4948    3.7%
Other values (3)      4585    3.4%

Common
Value                 Count   Frequency (%)
(space)               14084   43.5%
<                     9136    28.2%
1                     9136    28.2%

Most occurring blocks

Value                 Count    Frequency (%)
ASCII                 166460   100.0%

Most frequent character per block

ASCII
Value                 Count   Frequency (%)
N                     29849   17.9%
A                     25588   15.4%
E                     16742   10.1%
(space)               14084   8.5%
O                     11794   7.1%
C                     11794   7.1%
<                     9136    5.5%
1                     9136    5.5%
H                     9136    5.5%
I                     6556    3.9%
Other values (6)      22645   13.6%

Interactions

[Figure: pairwise interaction plots of the 8 numeric variables (64 panels)]

Correlations

[Figure: correlation heatmap]

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables. It is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and 1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
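A self-contained sketch of that definition follows. It ranks each variable, giving tied values the average of their ranks, and then takes Pearson's r of the ranks. Both the covariance and the standard deviations carry the same 1/n factor, which cancels, so the sketch uses plain sums.

For a monotonic but nonlinear pair such as y = x², ρ is 1 while r stays below 1. The helper names are illustrative.

```python
def average_ranks(values):
    """Rank values 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Pearson correlation of the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4]
y = [1, 4, 9, 16]
print(spearman(x, y), pearson(x, y))
```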
[Figure: correlation heatmap]

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and 1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables. As a result, for an exact linear relationship the steepness of the line does not affect r; only the direction of the slope, which sets the sign, does.

To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
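A direct sketch of this formula follows. The usage lines illustrate the invariance property: any exact linear relationship gives r = +1 or r = -1, whatever its slope. The 1/n (or 1/(n - 1)) factors cancel between numerator and denominator, so plain sums are used. `pearson` is an illustrative name, not pandas' API.

```python
def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The normalising factors cancel, so un-normalised sums suffice.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(pearson(x, [3 * v + 7 for v in x]))     # steep positive line: r is 1
print(pearson(x, [-0.5 * v + 2 for v in x]))  # shallow negative line: r is -1
```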
[Figure: correlation heatmap]

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and 1 indicates total positive correlation.

To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2.
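A direct O(n²) sketch of this definition follows. As written, it is the τ-a variant, which assumes there are no tied values.

pandas computes Kendall's τ through `scipy.stats.kendalltau`, whose default τ-b variant adjusts the denominator for ties. Results can therefore differ on columns with many repeated values, such as several in this dataset. `kendall_tau_a` is an illustrative name.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / (n * (n - 1) / 2); assumes no tied values."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# One swapped pair out of six: 5 concordant, 1 discordant, tau = 4/6.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.6666666666666666
```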
[Figure: correlation heatmap]

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables. It captures non-linear dependency and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. The phik package provides extensive documentation.

Missing values

[Figure: nullity bar chart]
A simple visualization of nullity by column.
[Figure: nullity matrix]
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
[Figure: nullity dendrogram]
The dendrogram lets you correlate variable completion more fully, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
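These plots are built on a per-column count of missing entries. For this dataset, the 207 missing total_bedrooms values give 207 / 20640 ≈ 1.0% for that column. Across all cells they give 207 / (20640 × 10) ≈ 0.1%, the figure shown in the overview.

The sketch below works on rows stored as dictionaries and treats `None` as missing; this is an assumption, since pandas would instead use `DataFrame.isna()`. The second row in the usage example is hypothetical.

```python
def nullity_summary(rows, columns):
    """Return per-column missing counts and the overall percentage of missing cells."""
    missing = {col: sum(row.get(col) is None for row in rows) for col in columns}
    total_cells = len(rows) * len(columns)
    overall_pct = 100 * sum(missing.values()) / total_cells
    return missing, overall_pct


# First row taken from the sample below; second row is hypothetical.
rows = [
    {"total_bedrooms": 129.0, "population": 322},
    {"total_bedrooms": None, "population": 2401},
]
print(nullity_summary(rows, ["total_bedrooms", "population"]))
```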

Sample

First rows

       longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value  ocean_proximity
0      -122.23    37.88     41                  880          129.0           322         126         8.3252         452600              NEAR BAY
1      -122.22    37.86     21                  7099         1106.0          2401        1138        8.3014         358500              NEAR BAY
2      -122.24    37.85     52                  1467         190.0           496         177         7.2574         352100              NEAR BAY
3      -122.25    37.85     52                  1274         235.0           558         219         5.6431         341300              NEAR BAY
4      -122.25    37.85     52                  1627         280.0           565         259         3.8462         342200              NEAR BAY
5      -122.25    37.85     52                  919          213.0           413         193         4.0368         269700              NEAR BAY
6      -122.25    37.84     52                  2535         489.0           1094        514         3.6591         299200              NEAR BAY
7      -122.25    37.84     52                  3104         687.0           1157        647         3.1200         241400              NEAR BAY
8      -122.26    37.84     42                  2555         665.0           1206        595         2.0804         226700              NEAR BAY
9      -122.25    37.84     52                  3549         707.0           1551        714         3.6912         261100              NEAR BAY

Last rows

       longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value  ocean_proximity
20630  -121.32    39.29     11                  2640         505.0           1257        445         3.5673         112000              INLAND
20631  -121.40    39.33     15                  2655         493.0           1200        432         3.5179         107200              INLAND
20632  -121.45    39.26     15                  2319         416.0           1047        385         3.1250         115600              INLAND
20633  -121.53    39.19     27                  2080         412.0           1082        382         2.5495         98300               INLAND
20634  -121.56    39.27     28                  2332         395.0           1041        344         3.7125         116800              INLAND
20635  -121.09    39.48     25                  1665         374.0           845         330         1.5603         78100               INLAND
20636  -121.21    39.49     18                  697          150.0           356         114         2.5568         77100               INLAND
20637  -121.22    39.43     17                  2254         485.0           1007        433         1.7000         92300               INLAND
20638  -121.32    39.43     18                  1860         409.0           741         349         1.8672         84700               INLAND
20639  -121.24    39.37     16                  2785         616.0           1387        530         2.3886         89400               INLAND